
Conversation

mshahneo
Owner

@mshahneo mshahneo commented Mar 28, 2025

Update 2:

After another round of review, the uArch definition is updated in the following way:

  • Add 5 common APIs for all the interfaces.
  • Make OpInterfaces more specific (e.g., 2D and 1D Block IO have separate interfaces).
  • Make the design more specific to Intel hardware.
  • Remove the information we don't use, i.e., make the design simpler.

Update:

After the first iteration of review comments, the current PR is updated in the following way:

  • The main pivot for this iteration is the utilities. These utilities are exposed via a range of interfaces (e.g., TileOpInterface, MatrixOpInterface). A specific instruction for a specific architecture needs to implement these utilities.
  • The uArch information is now directly embedded in the utility implementations. We are not using .yaml files anymore. However, I haven't removed the .yaml file yet, since it can still be utilized if we choose to do so in the future.
  • Also, last but not least, the current version provides an end-to-end prototype of how uArch can be used:
    -- Full uArch definition, utilities and all (at least for one instruction for now, DPAS).
    -- Definition for 2 uArchs (PVC, BMG).
    -- A pass to attach the uArch name to the module attribute using DLTI (a rough sketch follows this list).
    -- Shows the usage of the uArch interface to verify XeGPU ops.
    -- A few test cases.
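As a rough, hypothetical illustration of the attachment step mentioned above (not the PR's actual pass): the pass name, the attribute key `xegpu.uarch`, and the hard-coded chip string below are assumptions for illustration only.

```cpp
// Sketch only: attach a uArch name to the module so later passes/verifiers
// can look it up. Attribute key and value below are placeholders.
#include "mlir/IR/BuiltinAttributes.h"
#include "mlir/IR/BuiltinOps.h"
#include "mlir/Pass/Pass.h"

namespace {
struct AttachUArchPass
    : mlir::PassWrapper<AttachUArchPass, mlir::OperationPass<mlir::ModuleOp>> {
  void runOnOperation() override {
    mlir::ModuleOp module = getOperation();
    // A real pass would take the chip as a pass option or read it from the
    // XeVM/DLTI target description instead of hard-coding it.
    module->setAttr("xegpu.uarch",
                    mlir::StringAttr::get(module.getContext(), "intel_gpu_pvc"));
  }
};
} // namespace
```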

P.S. I still intentionally kept some old code, in case we want to go back to it or reuse some of it. I'll clean up once we agree on the process.

================================

Hi All,

This is the initial PR for the uArch definition. The primary purpose of this PR is to have a discussion and finalize the design, so please think of it as an RFC as well.

The first target we want to achieve here is to finalize the skeleton:

  • Skeleton of the YAML file: what information we want in there
  • Skeleton of the data structures we want to have (e.g., what is necessary to create utilities to support the need)

For this purpose, the main focus points of this PR are the following files:

  • mlir/lib/Dialect/XeGPU/Utils/intel_gpu_pvc.yaml : holds the YAML definition of uArch (PVC is used for this use case)
  • mlir/include/mlir/Dialect/XeGPU/Utils/uArch.h : holds the base uArch struct that can be re-used by specialized uArchs such as PVC.
  • mlir/include/mlir/Dialect/XeGPU/Utils/IntelGpuPVC.h : holds the PVC-specific structs.

mlir/lib/Dialect/XeGPU/Utils/IntelGpuPVC.cpp is added as a placeholder for the YAML mapping. This file is intentionally incomplete; we want to reach a consensus on the YAML and C++ structures before filling it in.
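For readers unfamiliar with LLVM's YAML I/O, here is a minimal sketch of how a C++ struct gets tied to a YAML document via llvm::yaml::MappingTraits; the struct and field names are illustrative placeholders, not the actual records from intel_gpu_pvc.yaml.

```cpp
// Sketch only: the LLVM YAML I/O pattern this PR relies on. The struct and
// its fields are placeholders, not the PR's real uArch records.
#include "llvm/Support/YAMLTraits.h"
#include <cstdint>
#include <string>

struct RegisterFileMode {
  std::string mode;  // e.g. "small" or "large"
  uint32_t grfCount; // registers per thread in this mode
};

namespace llvm {
namespace yaml {
template <> struct MappingTraits<RegisterFileMode> {
  static void mapping(IO &io, RegisterFileMode &rf) {
    io.mapRequired("mode", rf.mode);
    io.mapRequired("grf_count", rf.grfCount);
  }
};
} // namespace yaml
} // namespace llvm

// Reading side (roughly what IntelGpuPVC.cpp would do):
//   llvm::yaml::Input yin(yamlText);
//   RegisterFileMode rf;
//   yin >> rf;
```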

Please provide your valuable feedback as we try to finalize this together.

@mshahneo mshahneo requested review from chencha3 and nbpatel March 28, 2025 13:27
} // namespace mlir

#endif // MLIR_DIALECT_XEGPU_UTILS_INTEL_GPU_PVC_H
//===--- IntelGpuPVC.h ---------------------------------------*- C++ -*-===//
Collaborator

this one should be on the top I guess

Owner Author

Ah, yes, I think it was auto-added by a Copilot suggestion. Will remove it :)

@mshahneo mshahneo requested a review from silee2 March 31, 2025 16:07
@adam-smnk
Collaborator

A general question on the structure - what's the benefit of maintaining both yaml and C++ definitions?
Do we expect sth to reuse yaml separately from C++? Can the C++ bindings be autogenerated or will it be a lot of manual boilerplate?

Range systolic_depth;
Range repreat_count;
Range execution_size;
std::map<std::string, uint> ops_per_channel;
Collaborator

Opaque string mapping doesn't feel user friendly. Especially as it seems to be wrapping numerical bit width that could be a numerical key directly - I assume it's some limitation due to using (conversion from) yaml.

matrix_size is an even worse offender in that regard. These wrappers are hard to use without yaml definitions, which again makes me wonder if we really need to split it into two separate files.

Owner Author

Opaque string mapping doesn't feel user friendly. Especially as it seems to be wrapping numerical bit width that could be a numerical key directly - I assume it's some limitation due to using (conversion from) yaml.

You are right, the limitation is due to the yaml mapping.

matrix_size is an even worse offender in that regard. These wrappers are hard to use without yaml definitions, which again makes me wonder if we really need to split it into two separate files.

Sorry, the matrix size is a mistake. The value should be a vector of uint. But yes, in general you are right, the structs and yaml to some extent have to be used in conjunction.

@mshahneo
Owner Author

mshahneo commented Apr 2, 2025

A general question on the structure - what's the benefit of maintaining both yaml and C++ definitions? Do we expect sth to reuse yaml separately from C++? Can the C++ bindings be autogenerated or will it be a lot of manual boilerplate?

The C++ structure actually comes out of a necessity. To use the yaml mapping utility in LLVM, we have to have a C++ object mapped to it, hence the structures. And since we need the C++ structures anyway, I wanted to make some of them reusable across uArchs.

Do we expect sth to reuse yaml separately from C++

Not really, not in this context anyway. But we wanted to keep the base structs in such a way that they can be used in non-LLVM cases, but that's more of a hope than a necessity.

Can the C++ bindings be autogenerated or will it be a lot of manual boilerplate?

I thought about this, but it takes me back to tablegen. Should we use tablegen?

@adam-smnk
Collaborator

A general question on the structure - what's the benefit of maintaining both yaml and C++ definitions? Do we expect sth to reuse yaml separately from C++? Can the C++ bindings be autogenerated or will it be a lot of manual boilerplate?

The C++ structure actually comes out of a necessity. To use the yaml mapping utility in LLVM, we have to have a C++ object mapped to it, hence the structures. And since we need the C++ structures anyway, I wanted to make some of them reusable across uArchs.

Do we expect sth to reuse yaml separately from C++

Not really, not in this context anyway. But we wanted to keep the base structs in such a way that they can be used in non-LLVM cases, but that's more of a hope than a necessity.

It feels like the overhead of this approach in its complexity and possible maintenance cost is not really justified. On its own it seems fine but these C++ bindings are not great. 😅
Overall, I'd say yaml is not really a first-class citizen in MLIR so it'd face extra scrutiny during upstreaming.

Can the C++ bindings be autogenerated or will it be a lot of manual boilerplate?

I thought about this, but it takes me back to tablegen. Should we use tablegen?

It's worth a try if what we need easily translates to tablegen entries and structure. That is, if you find yourself at any point fighting against tablegen or in need to introduce custom "hacks", I would not necessarily double down on it.

I would suggest starting with user interfaces to flesh out what's really needed. Then it should become clearer where to store all the uarch details e.g., embed shared memory size directly in getShmemSize() or create a separate struct or pull it from a constexpr dictionary that contains everything etc.

Next quick test would be to see how well it all holds up for another target like battlemage.
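To make the storage options mentioned above concrete, here is a tiny sketch contrasting "embed the value in the getter" with "pull it from a constexpr table"; the names and the shared-memory value are placeholders, not real PVC/BMG numbers.

```cpp
// Sketch of two of the storage options discussed above; purely illustrative.
#include <cstdint>
#include <string_view>

// Option A: the per-uArch class embeds the value directly in the getter.
struct PVCuArchSketch {
  uint32_t getShmemSize() const { return 131072; } // placeholder value
};

// Option B: a constexpr table keyed by uArch name that a generic getter reads.
struct UArchEntry {
  std::string_view name;
  uint32_t shmemSize;
};
inline constexpr UArchEntry kUArchTable[] = {
    {"pvc", 131072}, // placeholder values
    {"bmg", 131072},
};
```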

@mshahneo
Owner Author

mshahneo commented Apr 4, 2025

A general question on the structure - what's the benefit of maintaining both yaml and C++ definitions? Do we expect sth to reuse yaml separately from C++? Can the C++ bindings be autogenerated or will it be a lot of manual boilerplate?

The C++ structure actually comes out of a necessity. To use the yaml mapping utility in LLVM, we have to have a C++ object mapped to it, hence the structures. And since we need the C++ structures anyway, I wanted to make some of them reusable across uArchs.

Do we expect sth to reuse yaml separately from C++

Not really, not in this context anyway. But we wanted to keep the base structs in such a way that they can be used in non-LLVM cases, but that's more of a hope than a necessity.

It feels like the overhead of this approach in its complexity and possible maintenance cost is not really justified. On its own it seems fine but these C++ bindings are not great. 😅 Overall, I'd say yaml is not really a first-class citizen in MLIR so it'd face extra scrutiny during upstreaming.

Can the C++ bindings be autogenerated or will it be a lot of manual boilerplate?

I thought about this, but it takes me back to tablegen. Should we use tablegen?

It's worth a try if what we need easily translates to tablegen entries and structure. That is, if you find yourself at any point fighting against tablegen or in need to introduce custom "hacks", I would not necessarily double down on it.

I would suggest starting with user interfaces to flesh out what's really needed. Then it should become clearer where to store all the uarch details e.g., embed shared memory size directly in getShmemSize() or create a separate struct or pull it from a constexpr dictionary that contains everything etc.

Next quick test would be to see how well it all holds up for another target like battlemage.

Thanks, Adam. I think that's a good idea. Let me try to collect the uArch info for Battlemage. I looked into it during the design, but more details may help us find an approach that is easy to maintain.

mshahneo pushed a commit that referenced this pull request May 6, 2025
llvm#138091)

Check this error for more context
(https://github.com/compiler-research/CppInterOp/actions/runs/14749797085/job/41407625681?pr=491#step:10:531)

This fails with 
```
* thread #1, name = 'CppInterOpTests', stop reason = signal SIGSEGV: address not mapped to object (fault address: 0x55500356d6d3)
  * frame #0: 0x00007fffee41cfe3 libclangCppInterOp.so.21.0gitclang::PragmaNamespace::~PragmaNamespace() + 99
    frame #1: 0x00007fffee435666 libclangCppInterOp.so.21.0gitclang::Preprocessor::~Preprocessor() + 3830
    frame #2: 0x00007fffee20917a libclangCppInterOp.so.21.0gitstd::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release() + 58
    frame #3: 0x00007fffee224796 libclangCppInterOp.so.21.0gitclang::CompilerInstance::~CompilerInstance() + 838
    frame #4: 0x00007fffee22494d libclangCppInterOp.so.21.0gitclang::CompilerInstance::~CompilerInstance() + 13
    frame #5: 0x00007fffed95ec62 libclangCppInterOp.so.21.0gitclang::IncrementalCUDADeviceParser::~IncrementalCUDADeviceParser() + 98
    frame #6: 0x00007fffed9551b6 libclangCppInterOp.so.21.0gitclang::Interpreter::~Interpreter() + 102
    frame #7: 0x00007fffed95598d libclangCppInterOp.so.21.0gitclang::Interpreter::~Interpreter() + 13
    frame #8: 0x00007fffed9181e7 libclangCppInterOp.so.21.0gitcompat::createClangInterpreter(std::vector<char const*, std::allocator<char const*>>&) + 2919
```

Problem : 

1) The destructor currently handles no clearance for the DeviceParser
and the DeviceAct. We currently only have this

https://github.com/llvm/llvm-project/blob/976493822443c52a71ed3c67aaca9a555b20c55d/clang/lib/Interpreter/Interpreter.cpp#L416-L419

2) The ownership for DeviceCI currently is present in
IncrementalCudaDeviceParser. But this should be similar to how the
combination for hostCI, hostAction and hostParser are managed by the
Interpreter. As on master the DeviceAct and DeviceParser are managed by
the Interpreter but not DeviceCI. This is problematic because :
IncrementalParser holds a Sema& which points into the DeviceCI. On
master, DeviceCI is destroyed before the base class ~IncrementalParser()
runs, causing Parser::reset() to access a dangling Sema (and as Sema
holds a reference to Preprocessor which owns PragmaNamespace) we see
this
```
  * frame #0: 0x00007fffee41cfe3 libclangCppInterOp.so.21.0gitclang::PragmaNamespace::~PragmaNamespace() + 99
    frame #1: 0x00007fffee435666 libclangCppInterOp.so.21.0gitclang::Preprocessor::~Preprocessor() + 3830
    
```
mshahneo pushed a commit that referenced this pull request May 6, 2025
Fix for:
`Assertion failed: (false && "Architecture or OS not supported"),
function CreateRegisterContextForFrame, file
/usr/src/contrib/llvm-project/lldb/source/Plugins/Process/elf-core/ThreadElfCore.cpp,
line 182.
PLEASE submit a bug report to https://bugs.freebsd.org/submit/ and
include the crash backtrace.
#0 0x000000080cd857c8 llvm::sys::PrintStackTrace(llvm::raw_ostream&,
int)
/usr/src/contrib/llvm-project/llvm/lib/Support/Unix/Signals.inc:723:13
#1 0x000000080cd85ed4
/usr/src/contrib/llvm-project/llvm/lib/Support/Unix/Signals.inc:797:3
#2 0x000000080cd82ae8 llvm::sys::RunSignalHandlers()
/usr/src/contrib/llvm-project/llvm/lib/Support/Signals.cpp:104:5
#3 0x000000080cd861f0 SignalHandler
/usr/src/contrib/llvm-project/llvm/lib/Support/Unix/Signals.inc:403:3 #4
0x000000080f159644 handle_signal
/usr/src/lib/libthr/thread/thr_sig.c:298:3
`
mshahneo pushed a commit that referenced this pull request May 8, 2025
The mcmodel=tiny memory model is only valid on ARM targets. While trying
this on X86 compiler throws an internal error along with stack dump.
llvm#125641
This patch resolves the issue.
Reduced test case:
```
#include <stdio.h>
int main( void )
{
printf( "Hello, World!\n" ); 
return 0; 
}
```
```
0.	Program arguments: /opt/compiler-explorer/clang-trunk/bin/clang++ -gdwarf-4 -g -o /app/output.s -fno-verbose-asm -S --gcc-toolchain=/opt/compiler-explorer/gcc-snapshot -fcolor-diagnostics -fno-crash-diagnostics -mcmodel=tiny <source>
1.	<eof> parser at end of file
 #0 0x0000000003b10218 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (/opt/compiler-explorer/clang-trunk/bin/clang+++0x3b10218)
 #1 0x0000000003b0e35c llvm::sys::CleanupOnSignal(unsigned long) (/opt/compiler-explorer/clang-trunk/bin/clang+++0x3b0e35c)
 #2 0x0000000003a5dbc3 llvm::CrashRecoveryContext::HandleExit(int) (/opt/compiler-explorer/clang-trunk/bin/clang+++0x3a5dbc3)
 llvm#3 0x0000000003b05cfe llvm::sys::Process::Exit(int, bool) (/opt/compiler-explorer/clang-trunk/bin/clang+++0x3b05cfe)
 llvm#4 0x0000000000d4e3eb LLVMErrorHandler(void*, char const*, bool) cc1_main.cpp:0:0
 llvm#5 0x0000000003a67c93 llvm::report_fatal_error(llvm::Twine const&, bool) (/opt/compiler-explorer/clang-trunk/bin/clang+++0x3a67c93)
 llvm#6 0x0000000003a67df8 (/opt/compiler-explorer/clang-trunk/bin/clang+++0x3a67df8)
 llvm#7 0x0000000002549148 llvm::X86TargetMachine::X86TargetMachine(llvm::Target const&, llvm::Triple const&, llvm::StringRef, llvm::StringRef, llvm::TargetOptions const&, std::optional<llvm::Reloc::Model>, std::optional<llvm::CodeModel::Model>, llvm::CodeGenOptLevel, bool) (/opt/compiler-explorer/clang-trunk/bin/clang+++0x2549148)
 llvm#8 0x00000000025491fc llvm::RegisterTargetMachine<llvm::X86TargetMachine>::Allocator(llvm::Target const&, llvm::Triple const&, llvm::StringRef, llvm::StringRef, llvm::TargetOptions const&, std::optional<llvm::Reloc::Model>, std::optional<llvm::CodeModel::Model>, llvm::CodeGenOptLevel, bool) (/opt/compiler-explorer/clang-trunk/bin/clang+++0x25491fc)
 llvm#9 0x0000000003db74cc clang::emitBackendOutput(clang::CompilerInstance&, clang::CodeGenOptions&, llvm::StringRef, llvm::Module*, clang::BackendAction, llvm::IntrusiveRefCntPtr<llvm::vfs::FileSystem>, std::unique_ptr<llvm::raw_pwrite_stream, std::default_delete<llvm::raw_pwrite_stream>>, clang::BackendConsumer*) (/opt/compiler-explorer/clang-trunk/bin/clang+++0x3db74cc)
llvm#10 0x0000000004460d95 clang::BackendConsumer::HandleTranslationUnit(clang::ASTContext&) (/opt/compiler-explorer/clang-trunk/bin/clang+++0x4460d95)
llvm#11 0x00000000060005ec clang::ParseAST(clang::Sema&, bool, bool) (/opt/compiler-explorer/clang-trunk/bin/clang+++0x60005ec)
llvm#12 0x00000000044614b5 clang::CodeGenAction::ExecuteAction() (/opt/compiler-explorer/clang-trunk/bin/clang+++0x44614b5)
llvm#13 0x0000000004737121 clang::FrontendAction::Execute() (/opt/compiler-explorer/clang-trunk/bin/clang+++0x4737121)
llvm#14 0x00000000046b777b clang::CompilerInstance::ExecuteAction(clang::FrontendAction&) (/opt/compiler-explorer/clang-trunk/bin/clang+++0x46b777b)
llvm#15 0x00000000048229e3 clang::ExecuteCompilerInvocation(clang::CompilerInstance*) (/opt/compiler-explorer/clang-trunk/bin/clang+++0x48229e3)
llvm#16 0x0000000000d50621 cc1_main(llvm::ArrayRef<char const*>, char const*, void*) (/opt/compiler-explorer/clang-trunk/bin/clang+++0xd50621)
llvm#17 0x0000000000d48e2d ExecuteCC1Tool(llvm::SmallVectorImpl<char const*>&, llvm::ToolContext const&) driver.cpp:0:0
llvm#18 0x00000000044acc99 void llvm::function_ref<void ()>::callback_fn<clang::driver::CC1Command::Execute(llvm::ArrayRef<std::optional<llvm::StringRef>>, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>*, bool*) const::'lambda'()>(long) Job.cpp:0:0
llvm#19 0x0000000003a5dac3 llvm::CrashRecoveryContext::RunSafely(llvm::function_ref<void ()>) (/opt/compiler-explorer/clang-trunk/bin/clang+++0x3a5dac3)
llvm#20 0x00000000044aceb9 clang::driver::CC1Command::Execute(llvm::ArrayRef<std::optional<llvm::StringRef>>, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>*, bool*) const (.part.0) Job.cpp:0:0
llvm#21 0x00000000044710dd clang::driver::Compilation::ExecuteCommand(clang::driver::Command const&, clang::driver::Command const*&, bool) const (/opt/compiler-explorer/clang-trunk/bin/clang+++0x44710dd)
llvm#22 0x0000000004472071 clang::driver::Compilation::ExecuteJobs(clang::driver::JobList const&, llvm::SmallVectorImpl<std::pair<int, clang::driver::Command const*>>&, bool) const (/opt/compiler-explorer/clang-trunk/bin/clang+++0x4472071)
llvm#23 0x000000000447c3fc clang::driver::Driver::ExecuteCompilation(clang::driver::Compilation&, llvm::SmallVectorImpl<std::pair<int, clang::driver::Command const*>>&) (/opt/compiler-explorer/clang-trunk/bin/clang+++0x447c3fc)
llvm#24 0x0000000000d4d2b1 clang_main(int, char**, llvm::ToolContext const&) (/opt/compiler-explorer/clang-trunk/bin/clang+++0xd4d2b1)
llvm#25 0x0000000000c12464 main (/opt/compiler-explorer/clang-trunk/bin/clang+++0xc12464)
llvm#26 0x00007ae43b029d90 (/lib/x86_64-linux-gnu/libc.so.6+0x29d90)
llvm#27 0x00007ae43b029e40 __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x29e40)
llvm#28 0x0000000000d488c5 _start (/opt/compiler-explorer/clang-trunk/bin/clang+++0xd488c5)
```

---------

Co-authored-by: Shashwathi N <[email protected]>
mshahneo pushed a commit that referenced this pull request Jun 13, 2025
…142952)

This was removed in llvm#135343 in
favour of making it a format variable, which we do here. This follows
the precedent of the `[opt]` and `[artificial]` markers.

Before:
```
 thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.2
 * frame #0: 0x000000010000037c a.out`inlined1() at inline.cpp:4:3
   frame #1: 0x000000010000037c a.out`regular() at inline.cpp:6:17
   frame #2: 0x00000001000003b8 a.out`inlined2() at inline.cpp:7:43
   frame #3: 0x00000001000003b4 a.out`main at inline.cpp:10:3
   frame #4: 0x0000000186345be4 dyld`start + 7040
```

After (note the `[inlined]` markers):
```
thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.2
* frame #0: 0x000000010000037c a.out`inlined1() at inline.cpp:4:3 [inlined]
  frame #1: 0x000000010000037c a.out`regular() at inline.cpp:6:17
  frame #2: 0x00000001000003b8 a.out`inlined2() at inline.cpp:7:43 [inlined]
  frame #3: 0x00000001000003b4 a.out`main at inline.cpp:10:3
  frame #4: 0x0000000186345be4 dyld`start + 7040
```

rdar://152642178
@mshahneo
Owner Author

Hi Adam (@adam-smnk), Igor (@Garra1980 ),

Modified the uarch definition based on our discussion.
In this version, the pivot is the utilities, which the necessary instructions have to implement.
I also moved the information into C++ (directly as part of the get functions instead of in yaml).

Please let me know what you think. I only added the implementation for DPAS. Will add load/store/prefetch if we agree on the process.

Thanks a lot in advance :).

P.S. I still intentionally kept some old code, in case we want to go back to it or reuse some of it. I'll clean up once we agree on the process.

@mshahneo mshahneo changed the title from "uArch definition (PR 1/N)" to "[XeGPU] uArch definition (PR 1/N)" Jun 26, 2025
@mshahneo
Owner Author

mshahneo commented Jun 26, 2025

@silee2 , @nbpatel , @chencha3 , @charithaintc , @Jianhui-Li, @akroviakov
please let me know if you have any comments/suggestions.
Thank you so much in advance :)

@Garra1980
Collaborator

also cc @tkarna

@tkarna

tkarna commented Jun 30, 2025

Do you think xegpu could at some point support the Data Layout and Target Information (DLTI) dialect for querying the uarch info?
FYI @rolfmorel

@rolfmorel

rolfmorel commented Jun 30, 2025

Thanks for adding me, @tkarna ! Agreed, it would be great to collaborate on ways to expose this target info through DLTI.

I think @tkarna's work on performant schedules for Xe - which have plentiful magic values derivable from hw info - should be a good exemplar of a user of Xe's DLTI. Showing how to automatically derive those values through DLTI would be a great success indicator.

I had a quick attempt at what exposing this data through DLTI might look like (here using the transform dialect though you can also do the same queries from C++):

module attribute { #xevm.target<arch = "intel_gpu_pvc", ... = ?any other flags that vary across hardware?> } {
    ...
}
module {transform.with_named_sequence} {
    transform.named_sequence @__main__(%mod: !transform.any_op) {
        %number_of_eus_per_xe_core = transform.dlti.query ["xe_core","#eus"] at %mod  // #xevm.target answers 8
        %number_of_threads_per_eu = transform.dlti.query ["xe_core","eu","#threads"] at %mod  // #xevm.target answers [8, 4]
        
        %shared_memory_size = transform.dlti.query ["xe","shared_memory","size"] at %mod  // #xevm.target answers 524288
        %shared_memory_alignment = transform.dlti.query ["xe","shared_memory","alignment"] at %mod  // #xevm.target answers 64

        %grf_bitwidth = transform.dlti.query ["xe","grf","bitwidth"] at %mod  // #xevm.target answers 512
        %grf_modes = transform.dlti.query ["xe","grf","modes"] at %mod  // #xevm.target answers ["small", "large"]
        %grf_count_per_thread = transform.dlti.query ["xe","grf","mode","small","per_thread"] at %mod  // #xevm.target answers 128

        %dpas_opcode = transform.dlti.query ["xe","inst","dpas","opcode"] at %mod  // #xevm.target answers 0x83
        %dpas_systolic_depth = transform.dlti.query ["xe","inst","dpas","systolic_depth"] at %mod  // #xevm.target answers [8]
        %dpas_repeat_count = transform.dlti.query ["xe","inst","dpas","repeat_count"] at %mod  // #xevm.target answers 1,2,3,4,5,6,7,8 as multiple associations for single result handle OR [1, 8] OR #dlti.map<"min" = 1, "max" = 8> OR #dlti.range<1,2,...,8>/#tune.range<1,2,...,8> which would also respond to "min" and "max" queries,
        %dpas_ops_per_channel = transform.dlti.query ["xe","inst","dpas","ops_per_channel"] at %mod  // #xevm.target answers #dlti.map<19: 1, 16: 2, 8: 4, 4: 8, 2:8>
        
        %dpas_supported_types = transform.dlti.query ["xe","inst","dpas","supported_types"] at %mod  // #xevm.target answers #xevm.dpas_types so that ...
        %dpas_supported_types_B_wildcard = transform.dlti.query ["xe","inst","dpas","supported_types", f16, f16, f16, #dlti._] at %mod  // #xevm.dpas_types (yielded by #xevm.target which answered prefix) answers f16
        %dpas_supported_types_Dst_wildcard = transform.dlti.query ["xe","inst","dpas","supported_types", #dlti._, f16, f16, f16] at %mod  // #xevm.dpas_types (yielded by #xevm.target which answered prefix) answers f16,f32 (i.e. two type params associated to the handle)
        %dpas_supported_types_Dst_and_Acc_wildcard:2 = transform.dlti.query ["xe","inst","dpas","supported_types", #dlti._, #dlti._, f16, f16] at %mod  // #xevm.dpas_types (yielded by #xevm.target which answered prefix) answers f16,f32,f16,f32 for first result and f16,f16,f32,f32 for second result
        
        ... // transform ops would take the above Values as parameters
    }
}

That is, there's an #xevm.target attribute (or #xegpu.target) that handles most queries directly though it can also delegate to attributes like #xevm.dpas_types (in this case constructed by xevm.target) if so desired. This means there should be a XeTargetAttr implementing the DLTIQueryInterface's single query(..) method. We can discuss what's the best mechanism for implementing this method.

Not every query above is (easily) implemented with current upstream DLTI. I am working on a PR (oriented towards DLTI attributes that respond to queries by querying for target info and applying cost models) that will enable all the above that I will push as a draft this week.

@mshahneo
Owner Author

mshahneo commented Jun 30, 2025

@tkarna , yes, we would like to.
Current plan was to utilize DLTI only to query/expose the target triple.

As @rolfmorel pointed out, the current DLTI is too verbose and simple. @adam-smnk & I spoke about using DLTI directly; the issue was that mapping all the hardware info into the DLTI attribute made the IR way too verbose and big (e.g., think of the .yaml file in the attribute list :().

Thanks @rolfmorel for the detailed example. I remember you said that you envision that a query can essentially take a C++ function as a key. That's why in the current iteration I tried to make the utilities a first-class citizen; this way, any op that implements these interfaces can be queried via DLTI:

%dpas_supported_M_sizes = transform.dlti.query ["xe","inst","dpas","get_supported_M", "bf16"] at %mod // #xevm.target answers the possible M sizes for type bf16...

This would essentially call that specific uArch's implementation of getSupportedM() function.

I'd like to discuss more with you all on this.
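To make the "query forwards to the uArch utility" idea concrete, here is a compressed sketch of what such a per-uArch implementation could look like; the interface shape loosely follows the PR, the M values mirror the DPAS repeat-count range 1..8 mentioned earlier, and all names are illustrative rather than the PR's final API.

```cpp
// Sketch: a uArch-specific DPAS description answering getSupportedM(), which a
// DLTI-style query could forward to. Names and values are illustrative.
#include "mlir/IR/Types.h"
#include <cstdint>
#include <vector>

struct MatrixOpInterface {
  virtual ~MatrixOpInterface() = default;
  virtual std::vector<uint32_t> getSupportedM(mlir::Type type) = 0;
};

struct PVCDPASSketch : MatrixOpInterface {
  std::vector<uint32_t> getSupportedM(mlir::Type type) override {
    // A real implementation would branch on the element type; this sketch
    // returns the DPAS repeat counts 1..8 regardless of type.
    (void)type;
    return {1, 2, 3, 4, 5, 6, 7, 8};
  }
};
```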

Collaborator

@adam-smnk adam-smnk left a comment

I like the general direction 👍
The overall concept to have common descriptors (the uArch.h) which can have platform specific implementations (PVC, battlemage variants etc.) is great.

I haven't paid too close attention to exact implementation yet. However, it is a bit hard to really judge these helper APIs and their exact abstraction right now. I think it might be best to just start with a small prototype as soon as possible and just iterate on the go.

@chencha3 I think the on-going XeGPU verifier relaxation could be a great opportunity to run more hands-on experiments with this infrastructure. Perhaps a small uArch-aware validation pass for XeGPU or XeVM ops?

@tkarna If you could share what key hardware properties you already use for tuning or which further info could help you in that, it'd be super valuable feedback to help steer this design.

Comment on lines 37 to 38
Collaborator

What exactly is the meaning of no_of_dims.
In this case, if dims.size() is different than no_of_dims then what happens?

Owner Author

You are right, we could just remove no_of_dims and get it from dims.size().
Maybe remove the class itself. Let me see.

@tkarna

tkarna commented Jul 1, 2025

@tkarna If you could share what key hardware properties you already use for tuning or which further info could help you in that, it'd be super valuable feedback to help steer this design.

In my preliminary tests I've used the following parameters for a 4k matmul benchmark.

Fixed parameters:

  • DPAS tile sizes (n, m, k)

Tunable parameters:

  • WG tile (n, m); SG tile (n, m); K tile size
  • Prefetch tile sizes for A and B
  • Block (load) tile sizes for A and B

Implemented constraints: a tile size must not exceed the parent tile size and must divide the parent tile evenly. SG tiling must not exceed the max number of threads (32). Prefetch tile size must be compatible with the SG-level thread configuration (coop prefetching).

There are other hardware constraints I have not explicitly expressed yet, for example maximum/supported load tile size and VNNI packing constraints for B. Some configs do spill registers and run slower.
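A minimal sketch of the nesting/divisibility constraint described above, for 2-D tiles; this is illustrative only and not from the PR.

```cpp
// Sketch: a child tile must not exceed its parent and must divide it evenly.
#include <array>
#include <cstdint>

using Tile2D = std::array<uint32_t, 2>;

inline bool nestsEvenly(const Tile2D &child, const Tile2D &parent) {
  for (int d = 0; d < 2; ++d)
    if (child[d] == 0 || child[d] > parent[d] || parent[d] % child[d] != 0)
      return false;
  return true;
}

// e.g. nestsEvenly({32, 64}, {128, 256}) == true
//      nestsEvenly({48, 64}, {128, 256}) == false (48 does not divide 128)
```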

@mshahneo
Owner Author

mshahneo commented Jul 1, 2025

I like the general direction 👍 The overall concept to have common descriptors (the uArch.h) which can have platform specific implementations (PVC, battlemage variants etc.) is great.

I haven't paid too close attention to exact implementation yet. However, it is a bit hard to really judge these helper APIs and their exact abstraction right now. I think it might be best to just start with a small prototype as soon as possible and just iterate on the go.

@chencha3 I think the on-going XeGPU verifier relaxation could be a great opportunity to run more hands-on experiments with this infrastructure. Perhaps a small uArch-aware validation pass for XeGPU or XeVM ops?

@tkarna If you could share what key hardware properties you already use for tuning or which further info could help you in that, it'd be super valuable feedback to help steer this design.

Thank you so much, @adam-smnk, for the feedback :-).
I am working on the prototype; I should be able to push the changes today.

@mshahneo mshahneo force-pushed the uarch_definition branch 2 times, most recently from 042311e to 94ae51e July 2, 2025 05:05
mshahneo pushed a commit that referenced this pull request Jul 2, 2025
The function already exposes a work list to avoid deep recursion, this
commit starts utilizing it in a helper that could also lead to a deep
recursion.

We have observed this crash on `clang/test/C/C99/n590.c` with our
internal builds that enable aggressive optimizations and hit the limit
earlier than default release builds of Clang.

See the added test for an example with a deeper recursion that used to
crash in upstream Clang before this change with the following stack
trace:

```
  #0 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) /usr/local/google/home/ibiryukov/code/llvm-project/llvm/lib/Support/Unix/Signals.inc:804:13
  #1 llvm::sys::RunSignalHandlers() /usr/local/google/home/ibiryukov/code/llvm-project/llvm/lib/Support/Signals.cpp:106:18
  #2 SignalHandler(int, siginfo_t*, void*) /usr/local/google/home/ibiryukov/code/llvm-project/llvm/lib/Support/Unix/Signals.inc:0:3
  #3 (/lib/x86_64-linux-gnu/libc.so.6+0x3fdf0)
  #4 AnalyzeImplicitConversions(clang::Sema&, clang::Expr*, clang::SourceLocation, bool) /usr/local/google/home/ibiryukov/code/llvm-project/clang/lib/Sema/SemaChecking.cpp:12772:0
  #5 CheckCommaOperand /usr/local/google/home/ibiryukov/code/llvm-project/clang/lib/Sema/SemaChecking.cpp:0:3
  #6 AnalyzeImplicitConversions /usr/local/google/home/ibiryukov/code/llvm-project/clang/lib/Sema/SemaChecking.cpp:12644:7
  #7 AnalyzeImplicitConversions(clang::Sema&, clang::Expr*, clang::SourceLocation, bool) /usr/local/google/home/ibiryukov/code/llvm-project/clang/lib/Sema/SemaChecking.cpp:12776:5
  #8 CheckCommaOperand /usr/local/google/home/ibiryukov/code/llvm-project/clang/lib/Sema/SemaChecking.cpp:0:3
  #9 AnalyzeImplicitConversions /usr/local/google/home/ibiryukov/code/llvm-project/clang/lib/Sema/SemaChecking.cpp:12644:7
 #10 AnalyzeImplicitConversions(clang::Sema&, clang::Expr*, clang::SourceLocation, bool) /usr/local/google/home/ibiryukov/code/llvm-project/clang/lib/Sema/SemaChecking.cpp:12776:5
 #11 CheckCommaOperand /usr/local/google/home/ibiryukov/code/llvm-project/clang/lib/Sema/SemaChecking.cpp:0:3
 #12 AnalyzeImplicitConversions /usr/local/google/home/ibiryukov/code/llvm-project/clang/lib/Sema/SemaChecking.cpp:12644:7
 #13 AnalyzeImplicitConversions(clang::Sema&, clang::Expr*, clang::SourceLocation, bool) /usr/local/google/home/ibiryukov/code/llvm-project/clang/lib/Sema/SemaChecking.cpp:12776:5
 #14 CheckCommaOperand /usr/local/google/home/ibiryukov/code/llvm-project/clang/lib/Sema/SemaChecking.cpp:0:3
 #15 AnalyzeImplicitConversions /usr/local/google/home/ibiryukov/code/llvm-project/clang/lib/Sema/SemaChecking.cpp:12644:7
 #16 AnalyzeImplicitConversions(clang::Sema&, clang::Expr*, clang::SourceLocation, bool) /usr/local/google/home/ibiryukov/code/llvm-project/clang/lib/Sema/SemaChecking.cpp:12776:5
 #17 CheckCommaOperand /usr/local/google/home/ibiryukov/code/llvm-project/clang/lib/Sema/SemaChecking.cpp:0:3
 #18 AnalyzeImplicitConversions /usr/local/google/home/ibiryukov/code/llvm-project/clang/lib/Sema/SemaChecking.cpp:12644:7
 #19 AnalyzeImplicitConversions(clang::Sema&, clang::Expr*, clang::SourceLocation, bool) /usr/local/google/home/ibiryukov/code/llvm-project/clang/lib/Sema/SemaChecking.cpp:12776:5
... 700+ more stack frames.
```
@mshahneo mshahneo changed the base branch from main to dummy_main July 2, 2025 05:08
@mshahneo mshahneo force-pushed the uarch_definition branch from 94ae51e to af7098b July 2, 2025 05:11
@mshahneo mshahneo deleted the branch tmp_main July 2, 2025 05:12
enum class InstructionType { SIMT, SIMD, SPMD, MIMD, Other };

// An enum class to represent the scope of an instruction
enum class InstructionScope {


What is the reason the instruction itself doesn't indicate the scope?

Owner Author

For the user to know what scope the instruction is native to (work-item, subgroup, or workgroup).

Subgroup,
Workgroup,
Cluster,
Thread, // For CPU


XeGPU uArch includes an enum for CPU?

Owner Author

Idea was to make it generic. Removed now.

struct uArchHierarchyComponent {
std::string name = ""; // optional name of the hierarchy component
// no. of lower hierarchy component it contains, e.g., for PVC XeCore it
// contains 8 threads, so no_of_component=8


Here in the comment, does threads mean 8 EU threads? Or 8 EUs (each running 4 threads under large register mode)?

Owner Author

Yes.
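For illustration, a sketch of how such a hierarchy could be populated; only the "8 threads per XeCore" number comes from the comment in this hunk, and every other name and count is a placeholder.

```cpp
// Sketch: filling the hierarchy struct from the diff above with example data.
#include <cstdint>
#include <string>
#include <vector>

struct uArchHierarchyComponent {
  std::string name = "";    // optional name of the hierarchy component
  uint32_t no_of_component; // how many of the next-lower component it contains
};

// One possible PVC-style nesting; counts other than 8 are placeholders.
std::vector<uArchHierarchyComponent> pvcHierarchy = {
    {"gpu", /*stacks=*/2},
    {"stack", /*slices=*/4},
    {"slice", /*XeCores=*/16},
    {"XeCore", /*threads=*/8}, // from the comment: PVC XeCore contains 8 threads
    {"thread", 0},
};
```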

Vector, // 1-D vector
Matrix,
Tile,
Other


What is tile? Avoid Other in general - it means very little to the user.

Owner Author

Removed.

// The uArch includes:
// - the name of the uArch,
// - the description of the uArch,
// - the range of tiles supported by the uArch,


I don't think there is such uArch-level block size support. These are always per instruction. 2dload, 2dstore, and DPAS operations support different block sizes.

Owner Author

Sorry, it was a mistake.

virtual ~TileOpInterface() = default;
};

enum class MatrixType { MatrixA, MatrixB, MatrixC, MatrixD };


MatrixType => MMAMatrixEnum

Owner Author

Fixed now.

// };

// Create a TileLikeOp Interface
struct TileOpInterface {


TileOpInterface => 2DBlockIOInterface

Owner Author

Fixed.


enum class MatrixType { MatrixA, MatrixB, MatrixC, MatrixD };
struct MatrixOpInterface {
virtual bool checkSupportedMMATypes(mlir::Type AType, mlir::Type BType,


Missing interface to check whether a combination of matrix shapes is supported.

Owner Author

Added.

virtual std::vector<uint32_t> getSupportedK(mlir::Type type) = 0;
virtual std::vector<uint32_t> getSupportedN(mlir::Type type) = 0;
virtual std::vector<std::pair<unsigned, unsigned>>
getSupportedMatrix(mlir::Type type, MatrixType matrixType) = 0;
@Jianhui-Li Jianhui-Li Jul 16, 2025

getSupportedMatrix => getSupportedShape

Owner Author

Done.

// Can provide load/store/prefetch ops supported tile sizes for a specific
// uarch
virtual std::vector<std::vector<uint32_t>>
getSupportedTiles(mlir::Type type) = 0;


getSupportedTiles => getSupportedShape

@Jianhui-Li Jianhui-Li Jul 16, 2025

Consider having four basic functions for each interface (PVC):
getSupportedShape - used in blocking analysis to set the right xegpu_layout.inst_data parameter
CheckSupportedType - used in unrolling to verify the input IR
CheckSupportedShapes - used in unrolling to check the inst_data size of the output IR
Validate - used in XeGPU and XeVM op verification. The heaviest one - it checks more than type/shape. It can't give a guarantee if certain values are only available at runtime (dynamically allocated base ptr, dynamic shape).
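A compressed sketch of how the four functions suggested above could be grouped on one interface; the signatures, parameter types, and the interface name are illustrative assumptions, not the PR's final API.

```cpp
// Sketch of the four-function per-instruction interface suggested above.
#include "llvm/ADT/ArrayRef.h"
#include "mlir/IR/Types.h"
#include "mlir/Support/LogicalResult.h"
#include <cstdint>
#include <vector>

struct Block2DIOInterfaceSketch {
  virtual ~Block2DIOInterfaceSketch() = default;
  // Blocking analysis: candidate inst_data shapes for a given element type.
  virtual std::vector<std::vector<uint32_t>> getSupportedShape(mlir::Type t) = 0;
  // Unrolling: is this element type supported at all?
  virtual bool checkSupportedType(mlir::Type t) = 0;
  // Unrolling: is this concrete inst_data shape legal for this type?
  virtual bool checkSupportedShapes(llvm::ArrayRef<int64_t> shape, mlir::Type t) = 0;
  // Op verification: the heaviest check; more than type/shape, but static only.
  virtual mlir::LogicalResult validate(llvm::ArrayRef<int64_t> shape, mlir::Type t) = 0;
};
```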


Consider specific interfaces for: chunked load, 1D load.

Owner Author

Done.

@Jianhui-Li

Jianhui-Li commented Jul 18, 2025

After reviewing this PR with Abadullah, my understanding is that the team developed a consensus that XeGPU uArch definition won't use DLTI, at least for now. @rolfmorel @adam-smnk @rengolin
I am suggesting that Abadullah clarify the uArch function interface to the XeGPU passes, so that the underlying implementation could switch to DLTI when that becomes possible. @Garra1980
Meanwhile, I have some clarification questions about the scope of DLTI. Isn't it intended to support uArch definitions? If yes, what are the limitations and requirements for DLTI so that we could still converge to the same utility to access uArch info?

mshahneo added 2 commits July 23, 2025 18:08
Changes:
- Add 5 common APIs for all the interfaces.
- Make OpInterfaces more specific (e.g., 2D and 1D Block IO have separate interfaces).
- Make the design more specific to Intel hardware.
- Remove the information we don't use, i.e., make the design simpler.
@mshahneo
Owner Author

Hi @Jianhui-Li ,
Updated the PR based on our discussions.

mshahneo pushed a commit that referenced this pull request Jul 28, 2025
Tracked at llvm#112294

This patch implements from [basic.link]p14 to [basic.link]p18 partially.

The explicitly missing parts are:
- Anything related to specializations.
- Decide if a pointer is associated with a TU-local value at compile
  time.
- [basic.link]p15.1.2 to decide if a type is TU-local.
- Diagnose if TU-local functions from other TU are collected to the
  overload set. See [basic.link]p19, the call to 'h(N::A{});' in
  translation unit #2

There should be other implicitly missing parts as the wording uses
"names" briefly several times. But to implement this precisely, we have
to visit the whole AST, including Decls, Expression and Types, which may
be harder to implement and be more time-consuming for compilation time.
So I choose to implement the common parts.

It won't be too bad to miss some cases since we DIDN'T do any such
checks in the past 3 years. Any new check is an improvement. Given
modules have been basically available since clang15 without such checks,
it will be user unfriendly if we give a hard error now. And there are
a lot of cases which violating the rule actually just fine. So I decide
to emit it as warnings instead of hard errors.
mshahneo pushed a commit that referenced this pull request Jul 30, 2025
Extend support in LLDB for WebAssembly. This PR adds a new Process
plugin (ProcessWasm) that extends ProcessGDBRemote for WebAssembly
targets. It adds support for WebAssembly's memory model with separate
address spaces, and the ability to fetch the call stack from the
WebAssembly runtime.

I have tested this change with the WebAssembly Micro Runtime (WAMR,
https://github.com/bytecodealliance/wasm-micro-runtime) which implements
a GDB debug stub and supports the qWasmCallStack packet.

```
(lldb) process connect --plugin wasm connect://localhost:4567
Process 1 stopped
* thread #1, name = 'nobody', stop reason = trace
    frame #0: 0x40000000000001ad
wasm32_args.wasm`main:
->  0x40000000000001ad <+3>:  global.get 0
    0x40000000000001b3 <+9>:  i32.const 16
    0x40000000000001b5 <+11>: i32.sub
    0x40000000000001b6 <+12>: local.set 0
(lldb) b add
Breakpoint 1: where = wasm32_args.wasm`add + 28 at test.c:4:12, address = 0x400000000000019c
(lldb) c
Process 1 resuming
Process 1 stopped
* thread #1, name = 'nobody', stop reason = breakpoint 1.1
    frame #0: 0x400000000000019c wasm32_args.wasm`add(a=<unavailable>, b=<unavailable>) at test.c:4:12
   1    int
   2    add(int a, int b)
   3    {
-> 4        return a + b;
   5    }
   6
   7    int
(lldb) bt
* thread #1, name = 'nobody', stop reason = breakpoint 1.1
  * frame #0: 0x400000000000019c wasm32_args.wasm`add(a=<unavailable>, b=<unavailable>) at test.c:4:12
    frame #1: 0x40000000000001e5 wasm32_args.wasm`main at test.c:12:12
    frame #2: 0x40000000000001fe wasm32_args.wasm
```

This PR is based on an unmerged patch from Paolo Severini:
https://reviews.llvm.org/D78801. I intentionally stuck to the
foundations to keep this PR small. I have more PRs in the pipeline to
support the other features/packets.

My motivation for supporting Wasm is to support debugging Swift compiled
to WebAssembly:
https://www.swift.org/documentation/articles/wasm-getting-started.html
Operation *op = getOperation();

// Use XeVM target
auto gpuModuleOp = op->getParentOfType<gpu::GPUModuleOp>();


Can we wrap this code as an XeGPU utility function? Say, getXeGPUChipStr()?

Collaborator

+1, I think it is worth having a getXeArch() to hide the PVCuArch and BMGuArch instantiations defined at the beginning. The current approach doesn't sound scalable.
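A hedged sketch of what the suggested helper might look like; the `xegpu.uarch` discrete attribute it reads is a placeholder consistent with the earlier attachment sketch, and the real helper would more likely walk the gpu.module's target attributes and read the XeVM target's chip field.

```cpp
// Sketch only: look up a uArch/chip string from the surrounding gpu.module.
// The "xegpu.uarch" attribute key is a placeholder, not the real scheme.
#include "mlir/Dialect/GPU/IR/GPUDialect.h"
#include "mlir/IR/BuiltinAttributes.h"
#include <optional>
#include <string>

inline std::optional<std::string> getXeGPUChipStr(mlir::Operation *op) {
  auto gpuModule = op->getParentOfType<mlir::gpu::GPUModuleOp>();
  if (!gpuModule)
    return std::nullopt;
  if (auto chip = gpuModule->getAttrOfType<mlir::StringAttr>("xegpu.uarch"))
    return chip.getValue().str();
  return std::nullopt;
}
```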

mshahneo pushed a commit that referenced this pull request Aug 6, 2025
Pointers and GEP are untyped. SPIR-V required structured OpAccessChain.
This means the backend will have to determine a good way to retrieve the
structured access from an untyped GEP. This is not a trivial problem,
and needs to be addressed to have a robust compiler.

The issue is other workstreams relies on the access chain deduction to
work. So we have 2 options:
 - pause all dependent work until we have a good chain deduction.
- submit this limited fix to we can work on both this and other features
in parallel.

Choice we want to make is #2: submitting this **knowing this is not a
good** fix. It only increase the number of patterns we can work with,
thus allowing others to continue working on other parts of the backend.

This patch as-is has many limitations:
- If cannot robustly determine the depth of the structured access from a
GEP. Fixing this would require looking ahead at the full GEP chain.
- It cannot always figure out the correct access indices, especially
with dynamic indices. This will require frontend collaboration.

Because we know this is a temporary hack, this patch only impacts the
logical SPIR-V target. Physical SPIR-V, which can rely on pointer cast
remains on the old method.

Related to llvm#145002
Collaborator

It should be IntelGpuXe2.h here.

Collaborator

Do Load and Store belong to different units?

Collaborator

I am not sure whether it is ok and necessary to expose the opcode here.

// For example, a restriction that checks if the number of dimensions in a
// std::vector<std::vector<uint32_t>> is 2 can be represented as:
// std::vector<std::vector<uint32_t>> rt =
// {{1, 32}, {2, 16}}; Restriction<std::vector<std::vector<uint32_t>>> r1(rt,
Collaborator

nit: a format issue? Happy to see the restriction abstraction.
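For context, a sketch of what a generic restriction wrapper along the lines of this comment could look like; the class and helper names here are assumptions and the PR's real abstraction may differ.

```cpp
// Sketch: pair a piece of uArch data with a predicate that validates it.
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

template <typename T> struct RestrictionSketch {
  T data;
  std::function<bool(const T &)> check;
  RestrictionSketch(T d, std::function<bool(const T &)> c)
      : data(std::move(d)), check(std::move(c)) {}
  bool valid() const { return check(data); }
};

// The example from the comment above: require exactly two entries.
inline bool twoRowExample() {
  std::vector<std::vector<uint32_t>> rt = {{1, 32}, {2, 16}};
  RestrictionSketch<std::vector<std::vector<uint32_t>>> r1(
      rt,
      [](const std::vector<std::vector<uint32_t>> &v) { return v.size() == 2; });
  return r1.valid();
}
```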

std::string name = ""; // optional name of the hierarchy component
// no. of lower hierarchy component it contains, e.g., for PVC XeCore it
// contains 8 threads, so no_of_component=8
uint32_t no_of_component;
Collaborator

nit: num_components? what is the major use of it?

namespace xegpu {
namespace uArch {
namespace Xe2Plus {
struct XeCoreInfo {
Collaborator

can XeCoreInfo be used by other archs?

// MemoryAccessType memory_access_type;
// std::vector<std::string> supported_types;
std::vector<uint32_t> supported_types_bitwidth;
std::map<std::string, uint32_t> alignment;
Collaborator

What is the string used for in alignment? The type name?

}
};

namespace PVCuArch {
Collaborator

nit: maybe we can remove PVCuArch namespace if it is not hard required.

assert(tile.size() == 2);
return tile[1] * array_len *
(dataType.getIntOrFloatBitWidth() / 8) <=
64;
Collaborator

nit: format issue.
Is it also available for store2d, which doesn't have array length support on, e.g., PVC?


mshahneo pushed a commit that referenced this pull request Aug 14, 2025
…lvm#152156)

With this new A320 in-order core, we follow adding the
FeatureUseFixedOverScalableIfEqualCost feature to A510 and A520
(llvm#132246), which reaps the same code generation benefits of preferring
fixed over scalable when the cost is equal.

So when we have:
```
void foo(float* a, float* b, float* dst, unsigned n) {
    for (unsigned i = 0; i < n; ++i)
        dst[i] = a[i] + b[i];
}
```

When compiling without the feature enabled, we get:
```
...
    ld1b    { z0.b }, p0/z, [x0, x10]
    ld1b    { z2.b }, p0/z, [x1, x10]
    add     x12, x0, x10
    ldr     z1, [x12, #1, mul vl]
    add     x12, x1, x10
    ldr     z3, [x12, #1, mul vl]
    fadd    z0.s, z2.s, z0.s
    add     x12, x2, x10
    fadd    z1.s, z3.s, z1.s
    dech    x11
    st1b    { z0.b }, p0, [x2, x10]
    incb    x10, all, mul #2
    str     z1, [x12, #1, mul vl]
...
```

When compiling with, we get:
```
...
  	ldp	    q0, q1, [x12, #-16]
	ldp	    q2, q3, [x11, #-16]
	subs	x13, x13, #8
	fadd	v0.4s, v2.4s, v0.4s
	fadd	v1.4s, v3.4s, v1.4s
	add	    x11, x11, #32
	add	    x12, x12, #32
	stp	    q0, q1, [x10, #-16]
	add	    x10, x10, #32

...
```
mshahneo pushed a commit that referenced this pull request Aug 14, 2025
## Problem

When the new setting

```
set target.parallel-module-load true
```
was added, lldb began fetching modules from the devices from multiple
threads simultaneously. This caused crashes of lldb when debugging on
android devices.

The top of the stack in the crash look something like this:
```
#0 0x0000555aaf2b27fe llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (/opt/llvm/bin/lldb-dap+0xb87fe)
 #1 0x0000555aaf2b0a99 llvm::sys::RunSignalHandlers() (/opt/llvm/bin/lldb-dap+0xb6a99)
 #2 0x0000555aaf2b2fda SignalHandler(int, siginfo_t*, void*) (/opt/llvm/bin/lldb-dap+0xb8fda)
 #3 0x00007f9c02444560 __restore_rt /home/engshare/third-party2/glibc/2.34/src/glibc-2.34/signal/../sysdeps/unix/sysv/linux/libc_sigaction.c:13:0
 #4 0x00007f9c04ea7707 lldb_private::ConnectionFileDescriptor::Disconnect(lldb_private::Status*) (usr/bin/../lib/liblldb.so.15+0x22a7707)
 #5 0x00007f9c04ea5b41 lldb_private::ConnectionFileDescriptor::~ConnectionFileDescriptor() (usr/bin/../lib/liblldb.so.15+0x22a5b41)
 #6 0x00007f9c04ea5c1e lldb_private::ConnectionFileDescriptor::~ConnectionFileDescriptor() (usr/bin/../lib/liblldb.so.15+0x22a5c1e)
 #7 0x00007f9c052916ff lldb_private::platform_android::AdbClient::SyncService::Stat(lldb_private::FileSpec const&, unsigned int&, unsigned int&, unsigned int&) (usr/bin/../lib/liblldb.so.15+0x26916ff)
 #8 0x00007f9c0528b9dc lldb_private::platform_android::PlatformAndroid::GetFile(lldb_private::FileSpec const&, lldb_private::FileSpec const&) (usr/bin/../lib/liblldb.so.15+0x268b9dc)
```
Our workaround was to set `set target.parallel-module-load ` to `false`
to avoid the crash.

## Background

PlatformAndroid creates two different classes with one stateful adb
connection shared between the two -- one through AdbClient and another
through AdbClient::SyncService. The connection management and state is
complex, and seems to be responsible for the segfault we are seeing. The
AdbClient code resets these connections at times, and re-establishes
connections if they are not active. Similarly, PlatformAndroid caches
its SyncService, which uses an AdbClient class, but the SyncService puts
its connection into a different 'sync' state that is incompatible with a
standard connection.

## Changes in this diff

* This diff refactors the code to (hopefully) have clearer ownership of
the connection, clearer separation of AdbClient and SyncService by
making a new class for clearer separations of concerns, called
AdbSyncService.
* New unit tests are added
* Additional logs were added (see
llvm#145382 (comment)
for details)
mshahneo pushed a commit that referenced this pull request Aug 14, 2025
…namic (llvm#153420)

Canonicalizing the following IR:

```
func.func @mul_zero_dynamic_nofold(%arg0: tensor<?x17xf32>) -> tensor<?x17xf32> {
  %0 = "tosa.const"() <{values = dense<0.000000e+00> : tensor<1x1xf32>}> : () -> tensor<1x1xf32>
  %1 = "tosa.const"() <{values = dense<0> : tensor<1xi8>}> : () -> tensor<1xi8>
  %2 = tosa.mul %arg0, %0, %1 : (tensor<?x17xf32>, tensor<1x1xf32>, tensor<1xi8>) -> tensor<?x17xf32>
  return %2 : tensor<?x17xf32>
}
```

resulted in a crash

```
 #0 0x000056513187e8db backtrace (./build-release/bin/mlir-opt+0x9d698db)
 #1 0x0000565131b17737 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) /local-ssd/sayans/Softwares/llvm-repo/llvm-project-latest/llvm/lib/Support/Unix/Signals.inc:838:8
 #2 0x0000565131b187f3 PrintStackTraceSignalHandler(void*) /local-ssd/sayans/Softwares/llvm-repo/llvm-project-latest/llvm/lib/Support/Unix/Signals.inc:918:1
 #3 0x0000565131b18c30 llvm::sys::RunSignalHandlers() /local-ssd/sayans/Softwares/llvm-repo/llvm-project-latest/llvm/lib/Support/Signals.cpp:105:18
 #4 0x0000565131b18c30 SignalHandler(int, siginfo_t*, void*) /local-ssd/sayans/Softwares/llvm-repo/llvm-project-latest/llvm/lib/Support/Unix/Signals.inc:409:3
 #5 0x00007f2e4165b050 (/lib/x86_64-linux-gnu/libc.so.6+0x3c050)
 #6 0x00007f2e416a9eec __pthread_kill_implementation ./nptl/pthread_kill.c:44:76
 #7 0x00007f2e4165afb2 raise ./signal/../sysdeps/posix/raise.c:27:6
 #8 0x00007f2e41645472 abort ./stdlib/abort.c:81:7
 #9 0x00007f2e41645395 _nl_load_domain ./intl/loadmsgcat.c:1177:9
#10 0x00007f2e41653ec2 (/lib/x86_64-linux-gnu/libc.so.6+0x34ec2)
#11 0x00005651443ec4ba mlir::DenseIntOrFPElementsAttr::getRaw(mlir::ShapedType, llvm::ArrayRef<char>) /local-ssd/sayans/Softwares/llvm-repo/llvm-project-latest/mlir/lib/IR/BuiltinAttributes.cpp:1361:3
#12 0x00005651443f1209 mlir::DenseElementsAttr::resizeSplat(mlir::ShapedType) /local-ssd/sayans/Softwares/llvm-repo/llvm-project-latest/mlir/lib/IR/BuiltinAttributes.cpp:0:10
#13 0x000056513f76f2b6 mlir::tosa::MulOp::fold(mlir::tosa::MulOpGenericAdaptor<llvm::ArrayRef<mlir::Attribute>>) /local-ssd/sayans/Softwares/llvm-repo/llvm-project-latest/mlir/lib/Dialect/Tosa/IR/TosaCanonicalizations.cpp:0:0
```

from the folder for `tosa::mul`, since the zero value was being reshaped
to a `?x17` shape, which isn't supported. AFAIK, `tosa.const` requires all
dimensions to be static. So in this case, the fix is to not fold the op.
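
For illustration, a minimal sketch of the kind of guard this implies is below; the helper name and signature are hypothetical, and the real fix lives in `MulOp::fold` in TosaCanonicalizations.cpp.

```cpp
#include "mlir/IR/BuiltinTypes.h"
#include "mlir/IR/OpDefinition.h"

// Hypothetical helper, not the exact patch: a fold hook should give up when
// the result type has dynamic dimensions, because materializing a
// DenseElementsAttr splat requires a fully static shape.
static mlir::OpFoldResult foldOnlyIfStatic(mlir::Type resultType,
                                           mlir::OpFoldResult candidate) {
  auto shaped = llvm::dyn_cast<mlir::ShapedType>(resultType);
  if (!shaped || !shaped.hasStaticShape())
    return {}; // dynamic dims (e.g. ?x17): do not fold, keep the op as-is
  return candidate; // all dims static: folding to a constant is safe
}
```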
mshahneo pushed a commit that referenced this pull request Aug 19, 2025
This can happen when JIT code is run, and we can't symbolize those
frames, but they should remain numbered in the stack. An example
SpiderMonkey trace:

```
    #0 0x564ac90fb80f  (/builds/worker/dist/bin/js+0x240e80f) (BuildId: 5d053c76aad4cfbd08259f8832e7ac78bbeeab58)
    #1 0x564ac9223a64  (/builds/worker/dist/bin/js+0x2536a64) (BuildId: 5d053c76aad4cfbd08259f8832e7ac78bbeeab58)
    #2 0x564ac922316f  (/builds/worker/dist/bin/js+0x253616f) (BuildId: 5d053c76aad4cfbd08259f8832e7ac78bbeeab58)
    #3 0x564ac9eac032  (/builds/worker/dist/bin/js+0x31bf032) (BuildId: 5d053c76aad4cfbd08259f8832e7ac78bbeeab58)
    #4 0x0dec477ca22e  (<unknown module>)
```

Without this change, the following symbolization is output:

```
    #0 0x55a6d72f980f in MOZ_CrashSequence /builds/worker/workspace/obj-build/dist/include/mozilla/Assertions.h:248:3
    #1 0x55a6d72f980f in Crash(JSContext*, unsigned int, JS::Value*) /builds/worker/checkouts/gecko/js/src/shell/js.cpp:4223:5
    #2 0x55a6d7421a64 in CallJSNative(JSContext*, bool (*)(JSContext*, unsigned int, JS::Value*), js::CallReason, JS::CallArgs const&) /builds/worker/checkouts/gecko/js/src/vm/Interpreter.cpp:501:13
    #3 0x55a6d742116f in js::InternalCallOrConstruct(JSContext*, JS::CallArgs const&, js::MaybeConstruct, js::CallReason) /builds/worker/checkouts/gecko/js/src/vm/Interpreter.cpp:597:12
    #4 0x55a6d80aa032 in js::jit::DoCallFallback(JSContext*, js::jit::BaselineFrame*, js::jit::ICFallbackStub*, unsigned int, JS::Value*, JS::MutableHandle<JS::Value>) /builds/worker/checkouts/gecko/js/src/jit/BaselineIC.cpp:1705:10
    #4 0x2c803bd8f22e  (<unknown module>)
```

The last frame has a duplicate number. With this change the numbering is
correct:

```
    #0 0x5620c58ec80f in MOZ_CrashSequence /builds/worker/workspace/obj-build/dist/include/mozilla/Assertions.h:248:3
    #1 0x5620c58ec80f in Crash(JSContext*, unsigned int, JS::Value*) /builds/worker/checkouts/gecko/js/src/shell/js.cpp:4223:5
    #2 0x5620c5a14a64 in CallJSNative(JSContext*, bool (*)(JSContext*, unsigned int, JS::Value*), js::CallReason, JS::CallArgs const&) /builds/worker/checkouts/gecko/js/src/vm/Interpreter.cpp:501:13
    #3 0x5620c5a1416f in js::InternalCallOrConstruct(JSContext*, JS::CallArgs const&, js::MaybeConstruct, js::CallReason) /builds/worker/checkouts/gecko/js/src/vm/Interpreter.cpp:597:12
    #4 0x5620c669d032 in js::jit::DoCallFallback(JSContext*, js::jit::BaselineFrame*, js::jit::ICFallbackStub*, unsigned int, JS::Value*, JS::MutableHandle<JS::Value>) /builds/worker/checkouts/gecko/js/src/jit/BaselineIC.cpp:1705:10
    #5 0x349f24c7022e  (<unknown module>)
```
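
As a stand-in sketch of the numbering rule (this is not the actual sanitizer symbolizer code; types and names are illustrative only), every address consumes exactly one frame number, whether or not it could be symbolized:

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Each entry pairs an address with an optional symbol. Frames that failed to
// symbolize still consume a number, so later frames never duplicate one.
void printStack(
    const std::vector<std::pair<uint64_t, std::optional<std::string>>> &frames) {
  std::size_t frameNo = 0;
  for (const auto &[addr, symbol] : frames) {
    if (symbol)
      std::printf("    #%zu 0x%llx in %s\n", frameNo,
                  (unsigned long long)addr, symbol->c_str());
    else
      std::printf("    #%zu 0x%llx  (<unknown module>)\n", frameNo,
                  (unsigned long long)addr);
    ++frameNo; // increment unconditionally
  }
}
```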
mshahneo pushed a commit that referenced this pull request Sep 11, 2025
…build breakage from llvm#155943) (llvm#156103)

ASan now detects dereferences of zero-sized allocations
(llvm#155943; the corresponding
MSan change is llvm#155944). This
appears to have detected a bug in CrossOverTest.cpp, causing a buildbot
breakage. This patch fixes the test.

Buildbot report: https://lab.llvm.org/buildbot/#/builders/4/builds/8732
```
            7: ==949882==ERROR: AddressSanitizer: heap-buffer-overflow on address 0xf169cfbe0010 at pc 0xb5f45efc6d1c bp 0xffffd933e460 sp 0xffffd933e458
check:20'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            8: READ of size 1 at 0xf169cfbe0010 thread T0
check:20'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            9:  #0 0xb5f45efc6d18 in LLVMFuzzerTestOneInput /home/tcwg-buildbot/worker/clang-aarch64-sve-vls-2stage/llvm/compiler-rt/test/fuzzer/CrossOverTest.cpp:48:7
check:20'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
check:20'1                                                                                                                                 ?                             possible intended match
           10:  #1 0xb5f45eec7288 in fuzzer::Fuzzer::ExecuteCallback(unsigned char const*, unsigned long) /home/tcwg-buildbot/worker/clang-aarch64-sve-vls-2stage/llvm/compiler-rt/lib/fuzzer/FuzzerLoop.cpp:619:13
check:20'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
           11:  #2 0xb5f45eec85d4 in fuzzer::Fuzzer::ReadAndExecuteSeedCorpora(std::vector<fuzzer::SizedFile, std::allocator<fuzzer::SizedFile>>&) /home/tcwg-buildbot/worker/clang-aarch64-sve-vls-2stage/llvm/compiler-rt/lib/fuzzer/FuzzerLoop.cpp:812:3
check:20'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
           12:  #3 0xb5f45eec8c60 in fuzzer::Fuzzer::Loop(std::vector<fuzzer::SizedFile, std::allocator<fuzzer::SizedFile>>&) /home/tcwg-buildbot/worker/clang-aarch64-sve-vls-2stage/llvm/compiler-rt/lib/fuzzer/FuzzerLoop.cpp:872:3
check:20'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
           13:  #4 0xb5f45eeb5c64 in fuzzer::FuzzerDriver(int*, char***, int (*)(unsigned char const*, unsigned long)) /home/tcwg-buildbot/worker/clang-aarch64-sve-vls-2stage/llvm/compiler-rt/lib/fuzzer/FuzzerDriver.cpp:923:6
check:20'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
           14:  #5 0xb5f45eee09d0 in main /home/tcwg-buildbot/worker/clang-aarch64-sve-vls-2stage/llvm/compiler-rt/lib/fuzzer/FuzzerMain.cpp:20:10
check:20'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
```

For context, FuzzerLoop.cpp:812 tries an empty input:
```
810  // Test the callback with empty input and never try it again.
811  uint8_t dummy = 0;
812  ExecuteCallback(&dummy, 0);
```
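
A minimal sketch of the kind of guard such a harness needs (not necessarily the exact change that landed in CrossOverTest.cpp) looks like this:

```cpp
#include <cstddef>
#include <cstdint>

// Sketch only: the driver invokes the target once with an empty input
// (ExecuteCallback(&dummy, 0)), so the target must check Size before touching
// Data, now that ASan flags reads from zero-sized allocations.
extern "C" int LLVMFuzzerTestOneInput(const uint8_t *Data, size_t Size) {
  if (Size == 0)
    return 0;                         // empty input: nothing to inspect
  volatile uint8_t first = Data[0];   // safe: Size >= 1 here
  (void)first;
  return 0;
}
```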
mshahneo pushed a commit that referenced this pull request Sep 11, 2025
Reverts llvm#154949 due to suspected buildbot breakage
(https://lab.llvm.org/buildbot/#/builders/55/builds/16630/steps/11/logs/stdio).
Previously commented on the original pull request:
llvm#154949 (comment)

```
******************** TEST 'MLIR :: Dialect/XeGPU/subgroup-distribute.mlir' FAILED ********************
...
# | PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
# | Stack dump:
# | 0.	Program arguments: /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm_build_hwasan/bin/mlir-opt -xegpu-subgroup-distribute -allow-unregistered-dialect -canonicalize -cse -split-input-file /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm-project/mlir/test/Dialect/XeGPU/subgroup-distribute.mlir
# |  #0 0x0000c0af4b066df0 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm-project/llvm/lib/Support/Unix/Signals.inc:834:13
# |  #1 0x0000c0af4b060e20 llvm::sys::RunSignalHandlers() /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm-project/llvm/lib/Support/Signals.cpp:105:18
# |  #2 0x0000c0af4b0691b4 SignalHandler(int, siginfo_t*, void*) /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm-project/llvm/lib/Support/Unix/Signals.inc:426:38
# |  #3 0x0000ee25a3dcb8f8 (linux-vdso.so.1+0x8f8)
# |  #4 0x0000ee25a36c7608 (/lib/aarch64-linux-gnu/libc.so.6+0x87608)
# |  #5 0x0000ee25a367cb3c raise (/lib/aarch64-linux-gnu/libc.so.6+0x3cb3c)
# |  #6 0x0000ee25a3667e00 abort (/lib/aarch64-linux-gnu/libc.so.6+0x27e00)
# |  #7 0x0000c0af4ae7e4b0 __sanitizer::Atexit(void (*)()) /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm-project/compiler-rt/lib/sanitizer_common/sanitizer_posix_libcdep.cpp:168:10
# |  #8 0x0000c0af4ae7c354 __sanitizer::Die() /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm-project/compiler-rt/lib/sanitizer_common/sanitizer_termination.cpp:52:5
# |  #9 0x0000c0af4ae66a30 Unlock /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm-project/compiler-rt/lib/hwasan/../sanitizer_common/sanitizer_mutex.h:250:16
# | #10 0x0000c0af4ae66a30 ~GenericScopedLock /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm-project/compiler-rt/lib/hwasan/../sanitizer_common/sanitizer_mutex.h:386:51
# | #11 0x0000c0af4ae66a30 __hwasan::ScopedReport::~ScopedReport() /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm-project/compiler-rt/lib/hwasan/hwasan_report.cpp:54:5
# | #12 0x0000c0af4ae661b8 __hwasan::(anonymous namespace)::BaseReport::~BaseReport() /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm-project/compiler-rt/lib/hwasan/hwasan_report.cpp:477:7
# | #13 0x0000c0af4ae63f5c __hwasan::ReportTagMismatch(__sanitizer::StackTrace*, unsigned long, unsigned long, bool, bool, unsigned long*) /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm-project/compiler-rt/lib/hwasan/hwasan_report.cpp:1094:1
# | #14 0x0000c0af4ae4f8e0 Destroy /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm-project/compiler-rt/lib/hwasan/../sanitizer_common/sanitizer_common.h:532:31
# | #15 0x0000c0af4ae4f8e0 ~InternalMmapVector /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm-project/compiler-rt/lib/hwasan/../sanitizer_common/sanitizer_common.h:642:56
# | #16 0x0000c0af4ae4f8e0 __hwasan::HandleTagMismatch(__hwasan::AccessInfo, unsigned long, unsigned long, void*, unsigned long*) /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm-project/compiler-rt/lib/hwasan/hwasan.cpp:245:1
# | #17 0x0000c0af4ae51e8c __hwasan_tag_mismatch4 /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm-project/compiler-rt/lib/hwasan/hwasan.cpp:764:1
# | #18 0x0000c0af4ae67b30 __interception::InterceptFunction(char const*, unsigned long*, unsigned long, unsigned long) /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm-project/compiler-rt/lib/interception/interception_linux.cpp:60:0
# | #19 0x0000c0af5641cd24 getNumResults /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm-project/mlir/include/mlir/IR/Operation.h:404:37
# | #20 0x0000c0af5641cd24 getOpResultImpl /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm-project/mlir/include/mlir/IR/Operation.h:1010:5
# | #21 0x0000c0af5641cd24 getResult /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm-project/mlir/include/mlir/IR/Operation.h:407:54
# | #22 0x0000c0af5641cd24 mlir::OpTrait::detail::MultiResultTraitBase<mlir::gpu::WarpExecuteOnLane0Op, mlir::OpTrait::VariadicResults>::getResult(unsigned int) /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm-project/mlir/include/mlir/IR/OpDefinition.h:638:62
# | #23 0x0000c0af56426b60 getType /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm-project/mlir/include/mlir/IR/Value.h:63:33
# | #24 0x0000c0af56426b60 getType /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm-project/mlir/include/mlir/IR/Value.h:105:39
# | #25 0x0000c0af56426b60 (anonymous namespace)::LoadDistribution::matchAndRewrite(mlir::gpu::WarpExecuteOnLane0Op, mlir::PatternRewriter&) const /home/b/sanitizer-aarch64-linux-bootstrap-hwasan/build/llvm-project/mlir/lib/Dialect/XeGPU/Transforms/XeGPUSubgroupDistribute.cpp:991:55
...
```
mshahneo pushed a commit that referenced this pull request Sep 29, 2025
A few improvements to logging when lldb-dap is started in **Server
Mode** AND when the **`lldb-dap.logFolder`** setting is used (not
`lldb-dap.log-path`).

### Improvement #1
**Avoid prompting to restart the server when starting each debug
session.**

That prompt is caused by the combination of the following facts:
1. The log filename changes every time a new debug session starts
(see
[here](https://github.com/llvm/llvm-project/blob/9d6062c490548a5e6fea103e010ab3c9bc73a86d/lldb/tools/lldb-dap/src-ts/logging.ts#L47))
2. The log filename is passed to the server via an environment variable
called "LLDBDAP_LOG" (see
[here](https://github.com/llvm/llvm-project/blob/9d6062c490548a5e6fea103e010ab3c9bc73a86d/lldb/tools/lldb-dap/src-ts/debug-adapter-factory.ts#L263-L269))
3. All environment variables are put into the "spawn info" variable (see
[here](https://github.com/llvm/llvm-project/blob/9d6062c490548a5e6fea103e010ab3c9bc73a86d/lldb/tools/lldb-dap/src-ts/lldb-dap-server.ts#L170-L172)).
4. The old and new "spawn info" are compared to decide if a prompt
should show (see
[here](https://github.com/llvm/llvm-project/blob/9d6062c490548a5e6fea103e010ab3c9bc73a86d/lldb/tools/lldb-dap/src-ts/lldb-dap-server.ts#L107-L110)).

The fix is to remove the "LLDBDAP_LOG" from the "spawn info" variable,
so that the same server can be reused if the log path is the only thing
that has changed.

### Improvement #2
**Avoid log file conflict when multiple users share a machine and start
server in the same second.**

The problem: if two users start an lldb-dap server in the same second, they
will share the same log path. The first user will create the log file.
The second user will find that they cannot access the same file, so
their server will fail to start.

The fix is to add a part of the VS Code session ID to the log filename.

### Improvement #3
**Avoid restarting the server when the order of environment variables
changed.**

This is done by sorting the environment variables before putting them
into the "spawn info".
mshahneo pushed a commit that referenced this pull request Sep 29, 2025
Need this as `mlir/dialects/transform/smt.py` imports it:

```py
from .._transform_smt_extension_ops_gen import *
from .._transform_smt_extension_ops_gen import _Dialect
```
mshahneo pushed a commit that referenced this pull request Oct 9, 2025
A recent change adding a new sanitizer kind (via Sanitizers.def) was
reverted in c74fa20 ("Revert "[Clang][CodeGen] Introduce the
AllocToken SanitizerKind" (llvm#162413)"). The reason was this ASan report,
when running the test cases in
clang/test/Preprocessor/print-header-json.c:

```
==clang==483265==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x7d82b97e8b58 at pc 0x562cd432231f bp 0x7fff3fad0850 sp 0x7fff3fad0848
READ of size 16 at 0x7d82b97e8b58 thread T0
    #0 0x562cd432231e in __copy_non_overlapping_range<const unsigned long *, const unsigned long *> zorg-test/libcxx_install_asan_ubsan/include/c++/v1/string:2144:38
    #1 0x562cd432231e in void std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>::__init_with_size[abi:nn220000]<unsigned long const*, unsigned long const*>(unsigned long const*, unsigned long const*, unsigned long) zorg-test/libcxx_install_asan_ubsan/include/c++/v1/string:2685:18
    #2 0x562cd41e2797 in __init<const unsigned long *, 0> zorg-test/libcxx_install_asan_ubsan/include/c++/v1/string:2673:3
    #3 0x562cd41e2797 in basic_string<const unsigned long *, 0> zorg-test/libcxx_install_asan_ubsan/include/c++/v1/string:1174:5
    #4 0x562cd41e2797 in clang::ASTReader::ReadString(llvm::SmallVectorImpl<unsigned long> const&, unsigned int&) clang/lib/Serialization/ASTReader.cpp:10171:15
    #5 0x562cd41fd89a in clang::ASTReader::ParseLanguageOptions(llvm::SmallVector<unsigned long, 64u> const&, llvm::StringRef, bool, clang::ASTReaderListener&, bool) clang/lib/Serialization/ASTReader.cpp:6475:28
    #6 0x562cd41eea53 in clang::ASTReader::ReadOptionsBlock(llvm::BitstreamCursor&, llvm::StringRef, unsigned int, bool, clang::ASTReaderListener&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>&) clang/lib/Serialization/ASTReader.cpp:3069:11
    #7 0x562cd4204ab8 in clang::ASTReader::ReadControlBlock(clang::serialization::ModuleFile&, llvm::SmallVectorImpl<clang::ASTReader::ImportedModule>&, clang::serialization::ModuleFile const*, unsigned int) clang/lib/Serialization/ASTReader.cpp:3249:15
    #8 0x562cd42097d2 in clang::ASTReader::ReadASTCore(llvm::StringRef, clang::serialization::ModuleKind, clang::SourceLocation, clang::serialization::ModuleFile*, llvm::SmallVectorImpl<clang::ASTReader::ImportedModule>&, long, long, clang::ASTFileSignature, unsigned int) clang/lib/Serialization/ASTReader.cpp:5182:15
    #9 0x562cd421ec77 in clang::ASTReader::ReadAST(llvm::StringRef, clang::serialization::ModuleKind, clang::SourceLocation, unsigned int, clang::serialization::ModuleFile**) clang/lib/Serialization/ASTReader.cpp:4828:11
    #10 0x562cd3d07b74 in clang::CompilerInstance::findOrCompileModuleAndReadAST(llvm::StringRef, clang::SourceLocation, clang::SourceLocation, bool) clang/lib/Frontend/CompilerInstance.cpp:1805:27
    #11 0x562cd3d0b2ef in clang::CompilerInstance::loadModule(clang::SourceLocation, llvm::ArrayRef<clang::IdentifierLoc>, clang::Module::NameVisibilityKind, bool) clang/lib/Frontend/CompilerInstance.cpp:1956:31
    #12 0x562cdb04eb1c in clang::Preprocessor::HandleHeaderIncludeOrImport(clang::SourceLocation, clang::Token&, clang::Token&, clang::SourceLocation, clang::detail::SearchDirIteratorImpl<true>, clang::FileEntry const*) clang/lib/Lex/PPDirectives.cpp:2423:49
    #13 0x562cdb042222 in clang::Preprocessor::HandleIncludeDirective(clang::SourceLocation, clang::Token&, clang::detail::SearchDirIteratorImpl<true>, clang::FileEntry const*) clang/lib/Lex/PPDirectives.cpp:2101:17
    #14 0x562cdb043366 in clang::Preprocessor::HandleDirective(clang::Token&) clang/lib/Lex/PPDirectives.cpp:1338:14
    #15 0x562cdafa84bc in clang::Lexer::LexTokenInternal(clang::Token&, bool) clang/lib/Lex/Lexer.cpp:4512:7
    #16 0x562cdaf9f20b in clang::Lexer::Lex(clang::Token&) clang/lib/Lex/Lexer.cpp:3729:24
    #17 0x562cdb0d4ffa in clang::Preprocessor::Lex(clang::Token&) clang/lib/Lex/Preprocessor.cpp:896:11
    #18 0x562cd77da950 in clang::ParseAST(clang::Sema&, bool, bool) clang/lib/Parse/ParseAST.cpp:163:7
    [...]

0x7d82b97e8b58 is located 0 bytes after 3288-byte region [0x7d82b97e7e80,0x7d82b97e8b58)
allocated by thread T0 here:
    #0 0x562cca76f604 in malloc zorg-test/llvm-project/compiler-rt/lib/asan/asan_malloc_linux.cpp:67:3
    #1 0x562cd1cce452 in safe_malloc llvm/include/llvm/Support/MemAlloc.h:26:18
    #2 0x562cd1cce452 in llvm::SmallVectorBase<unsigned int>::grow_pod(void*, unsigned long, unsigned long) llvm/lib/Support/SmallVector.cpp:151:15
    #3 0x562cdbe1768b in grow_pod llvm/include/llvm/ADT/SmallVector.h:139:11
    #4 0x562cdbe1768b in grow llvm/include/llvm/ADT/SmallVector.h:525:41
    #5 0x562cdbe1768b in reserve llvm/include/llvm/ADT/SmallVector.h:665:13
    #6 0x562cdbe1768b in llvm::BitstreamCursor::readRecord(unsigned int, llvm::SmallVectorImpl<unsigned long>&, llvm::StringRef*) llvm/lib/Bitstream/Reader/BitstreamReader.cpp:230:10
    #7 0x562cd41ee8ab in clang::ASTReader::ReadOptionsBlock(llvm::BitstreamCursor&, llvm::StringRef, unsigned int, bool, clang::ASTReaderListener&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>&) clang/lib/Serialization/ASTReader.cpp:3060:49
    #8 0x562cd4204ab8 in clang::ASTReader::ReadControlBlock(clang::serialization::ModuleFile&, llvm::SmallVectorImpl<clang::ASTReader::ImportedModule>&, clang::serialization::ModuleFile const*, unsigned int) clang/lib/Serialization/ASTReader.cpp:3249:15
    #9 0x562cd42097d2 in clang::ASTReader::ReadASTCore(llvm::StringRef, clang::serialization::ModuleKind, clang::SourceLocation, clang::serialization::ModuleFile*, llvm::SmallVectorImpl<clang::ASTReader::ImportedModule>&, long, long, clang::ASTFileSignature, unsigned int) clang/lib/Serialization/ASTReader.cpp:5182:15
    #10 0x562cd421ec77 in clang::ASTReader::ReadAST(llvm::StringRef, clang::serialization::ModuleKind, clang::SourceLocation, unsigned int, clang::serialization::ModuleFile**) clang/lib/Serialization/ASTReader.cpp:4828:11
    #11 0x562cd3d07b74 in clang::CompilerInstance::findOrCompileModuleAndReadAST(llvm::StringRef, clang::SourceLocation, clang::SourceLocation, bool) clang/lib/Frontend/CompilerInstance.cpp:1805:27
    #12 0x562cd3d0b2ef in clang::CompilerInstance::loadModule(clang::SourceLocation, llvm::ArrayRef<clang::IdentifierLoc>, clang::Module::NameVisibilityKind, bool) clang/lib/Frontend/CompilerInstance.cpp:1956:31
    #13 0x562cdb04eb1c in clang::Preprocessor::HandleHeaderIncludeOrImport(clang::SourceLocation, clang::Token&, clang::Token&, clang::SourceLocation, clang::detail::SearchDirIteratorImpl<true>, clang::FileEntry const*) clang/lib/Lex/PPDirectives.cpp:2423:49
    #14 0x562cdb042222 in clang::Preprocessor::HandleIncludeDirective(clang::SourceLocation, clang::Token&, clang::detail::SearchDirIteratorImpl<true>, clang::FileEntry const*) clang/lib/Lex/PPDirectives.cpp:2101:17
    #15 0x562cdb043366 in clang::Preprocessor::HandleDirective(clang::Token&) clang/lib/Lex/PPDirectives.cpp:1338:14
    #16 0x562cdafa84bc in clang::Lexer::LexTokenInternal(clang::Token&, bool) clang/lib/Lex/Lexer.cpp:4512:7
    #17 0x562cdaf9f20b in clang::Lexer::Lex(clang::Token&) clang/lib/Lex/Lexer.cpp:3729:24
    #18 0x562cdb0d4ffa in clang::Preprocessor::Lex(clang::Token&) clang/lib/Lex/Preprocessor.cpp:896:11
    #19 0x562cd77da950 in clang::ParseAST(clang::Sema&, bool, bool) clang/lib/Parse/ParseAST.cpp:163:7
    [...]

SUMMARY: AddressSanitizer: heap-buffer-overflow clang/lib/Serialization/ASTReader.cpp:10171:15 in clang::ASTReader::ReadString(llvm::SmallVectorImpl<unsigned long> const&, unsigned int&)
```

The reason is this particular RUN line:
```
// RUN: env CC_PRINT_HEADERS_FORMAT=json CC_PRINT_HEADERS_FILTERING=direct-per-file CC_PRINT_HEADERS_FILE=%t.txt %clang -fsyntax-only -I %S/Inputs/print-header-json -isystem %S/Inputs/print-header-json/system -fmodules -fimplicit-module-maps -fmodules-cache-path=%t %s -o /dev/null
```

which was added in 8df194f ("[Clang] Support includes translated to
module imports in -header-include-filtering=direct-per-file (llvm#156756)").

The problem is caused by an incremental build reusing stale cached
module files (.pcm) that are no longer binary-compatible with the
updated compiler. Adding a new sanitizer option altered the implicit
binary layout of the serialized LangOptions data structure. The build +
test system is oblivious to such changes. When the new compiler
attempted to read the old module file (from the previous test
invocation), it misinterpreted the data due to the layout mismatch,
resulting in a heap-buffer-overflow. Unfortunately, Clang's PCM format
neither encodes nor detects version mismatches here; a more graceful
failure mode would be preferable.

For now, fix the test to be more robust with incremental build + test.
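
One common way to make such a modules test robust against stale caches (a guess at the shape of the fix, not necessarily what actually landed) is to wipe the implicit module cache at the top of the test, so every invocation rebuilds the .pcm files with the current compiler:

```
// RUN: rm -rf %t
```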
mshahneo pushed a commit that referenced this pull request Oct 21, 2025
**Mitigation for:** google/sanitizers#749

**Disclosure:** I'm not an ASan compiler expert yet (I'm trying to
learn!); I primarily work in the runtime. Some of this PR was developed
with the help of AI tools (primarily as a "fuzzy `grep` engine"), but
I've manually refined and tested the output, and can speak for every
line. In general, I used it only to orient myself and for
"rubberducking".

**Context:**

The MSVC ASan team (👋) has received an internal request to improve
clang's exception handling under ASan for Windows. Namely, we're
interested in **mitigating** this bug:
google/sanitizers#749

To summarize, today, clang + ASan produces a false-positive error for
this program:

```C++
#include <cstdio>
#include <exception>
int main()
{
	try	{
		throw std::exception("test");
	}catch (const std::exception& ex){
		puts(ex.what());
	}
	return 0;
}
```

The error reads as follows:


```
C:\Users\dajusto\source\repros\upstream>type main.cpp
#include <cstdio>
#include <exception>
int main()
{
        try     {
                throw std::exception("test");
        }catch (const std::exception& ex){
                puts(ex.what());
        }
        return 0;
}
C:\Users\dajusto\source\repros\upstream>"C:\Users\dajusto\source\repos\llvm-project\build.runtimes\bin\clang.exe" -fsanitize=address -g -O0 main.cpp

C:\Users\dajusto\source\repros\upstream>a.exe
=================================================================
==19112==ERROR: AddressSanitizer: access-violation on unknown address 0x000000000000 (pc 0x7ff72c7c11d9 bp 0x0080000ff960 sp 0x0080000fcf50 T0)
==19112==The signal is caused by a READ memory access.
==19112==Hint: address points to the zero page.
    #0 0x7ff72c7c11d8 in main C:\Users\dajusto\source\repros\upstream\main.cpp:8
    #1 0x7ff72c7d479f in _CallSettingFrame C:\repos\msvc\src\vctools\crt\vcruntime\src\eh\amd64\handlers.asm:49
    #2 0x7ff72c7c8944 in __FrameHandler3::CxxCallCatchBlock(struct _EXCEPTION_RECORD *) C:\repos\msvc\src\vctools\crt\vcruntime\src\eh\frame.cpp:1567
    #3 0x7ffb4a90e3e5  (C:\WINDOWS\SYSTEM32\ntdll.dll+0x18012e3e5)
    #4 0x7ff72c7c1128 in main C:\Users\dajusto\source\repros\upstream\main.cpp:6
    #5 0x7ff72c7c33db in invoke_main C:\repos\msvc\src\vctools\crt\vcstartup\src\startup\exe_common.inl:78
    #6 0x7ff72c7c33db in __scrt_common_main_seh C:\repos\msvc\src\vctools\crt\vcstartup\src\startup\exe_common.inl:288
    #7 0x7ffb49b05c06  (C:\WINDOWS\System32\KERNEL32.DLL+0x180035c06)
    #8 0x7ffb4a8455ef  (C:\WINDOWS\SYSTEM32\ntdll.dll+0x1800655ef)

==19112==Register values:
rax = 0  rbx = 80000ff8e0  rcx = 27d76d00000  rdx = 80000ff8e0
rdi = 80000fdd50  rsi = 80000ff6a0  rbp = 80000ff960  rsp = 80000fcf50
r8  = 100  r9  = 19930520  r10 = 8000503a90  r11 = 80000fd540
r12 = 80000fd020  r13 = 0  r14 = 80000fdeb8  r15 = 0
AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: access-violation C:\Users\dajusto\source\repros\upstream\main.cpp:8 in main
==19112==ABORTING
```

The root of the issue _appears to be_ that ASan's instrumentation is
incompatible with Windows' assumptions for instantiating `catch`-block
parameters (`ex` in the snippet above).

The nitty-gritty details are lost on me, but I understand that making
this work without losing ASan coverage requires a "serious" refactoring.
In the meantime, users risk false-positive errors when pairing ASan with
catch-block parameters on Windows.

**To mitigate this**, I think we should avoid instrumenting catch-block
parameters on Windows. It appears to me this is as "simple" as marking
catch-block parameters as "uninteresting" in
`AddressSanitizer::isInterestingAlloca`. My manual tests seem to confirm
this.
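
A hedged sketch of what such a check could look like follows; the helper name and the exact detection heuristic are assumptions, not the actual patch:

```cpp
#include "llvm/IR/Instructions.h"
#include "llvm/Support/Casting.h"

// Assumption-laden sketch: treat an alloca as "uninteresting" to ASan when it
// is handed to a catchpad, i.e. when it is the catch-block parameter slot that
// the Windows EH runtime initializes outside of ASan's view.
static bool isCatchParameterAlloca(const llvm::AllocaInst &AI) {
  for (const llvm::User *U : AI.users())
    if (llvm::isa<llvm::CatchPadInst>(U))
      return true; // the alloca appears among the catchpad's arguments
  return false;
}
```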

I believe this is strictly better than today's status quo, where the
runtime generates false positives. Although we're now explicitly
choosing to instrument less, the benefit is that now more programs can
run with ASan without _funky_ macros that disable ASan on exception
blocks.

**This PR:** implements the mitigation above, and creates a simple new
test for it.

_Thanks!_

---------

Co-authored-by: Antonio Frighetto <[email protected]>